An Experiment on Visible Changes of Web Pages
Abstract
Because web pages are constantly created, changed, and deleted, web databases (local collections of web pages) must be refreshed to keep their copies up to date. To keep web databases fresh effectively, a number of studies on detecting web page changes have been carried out, and various web statistics have been reported in the literature. This paper considers web page changes in terms of what is visible to the user. First, we consider the effect of a number of tags that make no difference to what the user sees. We found that approximately 4.5% of the web page changes detected by byte-wise comparison were detected unnecessarily. Second, we investigated the relationship between ‘TITLE’ tags and ‘BODY’ tags with respect to web page changes. We found that inspecting ‘TITLE’ tags could be sufficient for users to determine whether a web page has changed, which can significantly reduce the time needed to compare web pages.
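The distinction between a byte-wise change and a visible change can be illustrated with a minimal Python sketch. It is not the method used in the paper: the abstract does not list which tags were treated as invisible, so the `SKIPPED_TAGS` set, the handling of comments and attributes, the function names, and the fallback from the TITLE check to a BODY text comparison are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: elements whose text content is never rendered to the user.
# The paper's actual list of ignored tags is not given in the abstract.
SKIPPED_TAGS = {"script", "style", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collects the TITLE text and the remaining rendered text separately."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Attributes (e.g. <meta content=...>) are ignored entirely.
        if tag == "title":
            self._in_title = True
        elif tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.body_parts.append(data)
    # HTML comments are dropped: handle_comment is left as the default no-op.


def visible_parts(html):
    """Return (title_text, body_text) with whitespace normalized."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()

    def normalize(parts):
        return " ".join(" ".join(parts).split())

    return normalize(parser.title_parts), normalize(parser.body_parts)


def classify_change(old_html, new_html):
    """Classify the difference between two snapshots of one page."""
    if old_html == new_html:  # stand-in for a byte-wise comparison
        return "unchanged"
    old_title, old_body = visible_parts(old_html)
    new_title, new_body = visible_parts(new_html)
    if old_title != new_title:  # cheap check, done first
        return "title changed"
    if old_body != new_body:
        return "body changed"
    return "invisible change only"
```

Under these assumptions, two snapshots that differ only in a `<meta>` attribute or an HTML comment are reported as "invisible change only", which is the kind of change a byte-wise comparison would flag unnecessarily. Checking the title first keeps the common case cheap; whether the title check alone is sufficient is the paper's empirical claim and is not something this sketch establishes.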
Similar resources
Web Anomaly Detection by Building Access Usage Profiles
Due to the increase in cyber-attacks, the need for techniques that detect attacks on web servers has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Sound, Music and Textual Associations on the World Wide Web
Sound files on the World Wide Web are accessed from web pages. To date, this relationship has not been explored extensively in the MIR literature. This paper details a series of experiments designed to measure the similarity between the public text visible on a web page and the linked sound files, the name of which is normally unseen by the user. A collection of web pages was retrieved from the...
Publication date: 2006